
    Anterior Prefrontal Cortex Contributes to Action Selection through Tracking of Recent Reward Trends

    The functions of prefrontal cortex remain enigmatic, especially for its anterior sectors, whose putative roles range from planning to self-initiated behavior, social cognition, task switching, and memory. A predominant current theory regarding the most anterior sector, the frontopolar cortex (FPC), is that it is involved in exploring alternative courses of action, but the detailed causal mechanisms remain unknown. Here we investigated this issue using the lesion method, together with a novel model-based analysis. Eight patients with anterior prefrontal brain lesions including the FPC performed a four-armed bandit task known from neuroimaging studies to activate the FPC. Model-based analyses of learning demonstrated a selective deficit in the ability to extrapolate the most recent trend, despite an intact general ability to learn from past rewards. Whereas both brain-damaged and healthy controls used comparisons between the two most recent choice outcomes to infer trends that influenced their decision about the next choice, the group with anterior prefrontal lesions showed a complete absence of this component and instead based their choices entirely on the cumulative reward history. Given that the FPC is thought to be the most evolutionarily recent expansion of primate prefrontal cortex, we suggest that its function may reflect uniquely human adaptations for selecting and updating models of reward contingency in dynamic environments.
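
    The dissociation reported here, learning from cumulative reward history versus extrapolating the most recent trend, can be sketched as a simple bandit learner. This is an illustrative toy, not the paper's fitted model; the learning rate, trend weight, temperature, and payoff schedule below are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def choose(values, trends, w_trend=0.5, beta=3.0):
    # Softmax over a utility mixing cumulative value with the recent trend;
    # w_trend and beta are illustrative, not fitted, parameters.
    util = beta * (values + w_trend * trends)
    p = np.exp(util - util.max())
    p /= p.sum()
    return rng.choice(len(values), p=p)

n_arms, alpha = 4, 0.3
values = np.zeros(n_arms)               # cumulative reward history (delta rule)
last_two = [[] for _ in range(n_arms)]  # two most recent outcomes per arm

for t in range(200):
    # Trend = comparison of the two most recent outcomes on each arm.
    trends = np.array([o[-1] - o[-2] if len(o) == 2 else 0.0 for o in last_two])
    a = choose(values, trends)
    reward = rng.normal(0.5 + 0.1 * np.sin(t / 20 + a), 0.1)  # drifting payoffs
    values[a] += alpha * (reward - values[a])
    last_two[a] = (last_two[a] + [reward])[-2:]
```

    Setting `w_trend = 0` reproduces the qualitative pattern described for the lesion group: choices driven entirely by cumulative reward history, with the trend component absent.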

    Independent neural computation of value from other people's confidence

    Expectation of reward can be shaped by observing the actions and expressions of other people in one's environment. A person's apparent confidence in the likely reward of an action, for instance, makes qualities of their evidence, not observed directly, socially accessible. This strategy is computationally distinguished from associative-learning methods, which rely on direct observation, by its use of inference from indirect evidence. In twenty-three healthy human subjects, we isolated the effects of first-hand experience, other people's choices, and the mediating effect of their confidence on decision-making and on neural correlates of value within ventromedial prefrontal cortex (vmPFC). Value derived from first-hand experience and from other people's choices (regardless of confidence) was represented indiscriminately across vmPFC. However, value computed from agents' choices weighted by their associated confidence was represented with specificity for ventromedial area 10. This pattern corresponds to shifts of connectivity and overlapping cognitive processes along a posterior-anterior vmPFC axis. Task behavior correlated with self-reported self-reliance in decision-making in other social contexts. The tendency to conform in other social contexts corresponded to increased activation in cortical regions previously shown to respond to social conflict in proportion to subsequent conformity (Campbell-Meiklejohn et al., 2010). The tendency to self-monitor predicted a selectively enhanced response to accordance with others in the right temporoparietal junction (rTPJ). The findings anatomically decompose vmPFC value representations according to computational requirements and provide biological insight into the social transmission of preference and the reassurance gained from the confidence of others. Significance Statement: Decades of research have provided evidence that the ventromedial prefrontal cortex (vmPFC) signals the satisfaction we expect from imminent actions.
However, we have a surprisingly modest understanding of the organization of value across this substantial and varied region. This study finds that using cues of the reliability of other people's knowledge to enhance the expectation of personal success generates value correlates that are anatomically distinct from those concurrently computed from direct, personal experience. This suggests that the representation of decision values in vmPFC is suborganized according to the underlying computation, consistent with what we know about the anatomical heterogeneity of the region. These results also provide insight into the observational-learning process by which someone else's confidence can sway and reassure our choices.
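
    The computational distinction at issue, value inferred from others' choices weighted by their confidence versus unweighted observation of choices alone, can be sketched as follows. This is a minimal illustrative average, not the study's actual model, and all numbers are assumptions.

```python
import numpy as np

def confidence_weighted_value(choices, confidence):
    # choices: +1 if the observed agent chose the option, -1 if they chose
    # against it; confidence: that agent's expressed confidence in (0, 1].
    # A toy confidence-weighted average, not the paper's fitted model.
    choices = np.asarray(choices, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    return float(np.sum(choices * confidence) / np.sum(confidence))

# Two confident endorsements outweigh one unconfident rejection:
v_weighted = confidence_weighted_value([1, 1, -1], [0.9, 0.8, 0.2])
v_unweighted = float(np.mean([1, 1, -1]))  # choices alone, confidence ignored
```

    The gap between `v_weighted` and `v_unweighted` is the extra information carried by confidence, which in the study dissociated anatomically from value based on choices alone.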

    Model-based learning protects against forming habits.

    Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome-devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals. This work was funded by a Sir Henry Wellcome Postdoctoral Fellowship (101521/Z/12/Z) awarded to C.M.G. ND is supported by a Scholar Award from the McDonnell Foundation. The authors report no conflicts of interest and declare no competing financial interests. This is the final published version. It first appeared at http://link.springer.com/article/10.3758%2Fs13415-015-0347-6
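
    The model-based/model-free dissociation used in such sequential decision tasks is commonly formalized as a weighted mixture of a planned value and a cached value. The sketch below shows that standard hybrid weighting with purely illustrative transition probabilities and values; it is not the paper's exact fitted model.

```python
import numpy as np

def hybrid_q(q_mf, q_mb, w):
    # Hybrid valuation: w * model-based + (1 - w) * model-free. A higher w
    # (more model-based control) is what predicted devaluation sensitivity.
    return w * q_mb + (1 - w) * q_mf

# Model-based values are computed by planning through assumed transition
# probabilities to second-stage state values; all numbers are illustrative.
T = np.array([[0.7, 0.3],   # P(second-stage state | first-stage action 0)
              [0.3, 0.7]])  # P(second-stage state | first-stage action 1)
v_stage2 = np.array([0.8, 0.2])  # learned second-stage state values
q_mb = T @ v_stage2              # planned (model-based) action values
q_mf = np.array([0.5, 0.6])      # cached (model-free) action values

q = hybrid_q(q_mf, q_mb, w=0.7)  # a mostly model-based chooser
```

    Note that the model-based and model-free components here disagree about which action is better; the mixture weight `w` decides which one wins.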

    Model-based control can give rise to devaluation-insensitive choice

    Influential recent work aims to ground psychiatric dysfunction in the brain's basic computational mechanisms. For instance, the compulsive symptoms that feature prominently in drug abuse and addiction have been argued to arise from over-reliance on a habitual "model-free" system in contrast to a more laborious "model-based" system. Support for this account comes in part from failures to appropriately change behavior in light of new events. Notably, instrumental responding can, in some circumstances, persist despite reinforcer devaluation, perhaps reflecting control by model-free mechanisms that are driven by past reinforcement rather than knowledge of the (now devalued) outcome. However, another line of theory posits a different mechanism, latent-cause inference, that can modulate behavioral change. It concerns how animals identify different contingencies that apply in different circumstances by covertly clustering experiences into distinct groups. Here we combine both lines of theory to investigate the consequences of latent-cause inference for instrumental sensitivity to reinforcer devaluation. We show that instrumental insensitivity to reinforcer devaluation can arise in this theory even using only model-based planning, and does not require or imply any habitual, model-free component. These ersatz habits (like laboratory ones) emerge after overtraining, interact with contextual cues, and show preserved sensitivity to reinforcer devaluation on a separate consumption test, a standard control. Together, this work highlights the need for caution in using reinforcer-devaluation procedures to rule in (or out) the contribution of different learning mechanisms and offers a new perspective on the neurocomputational substrates of drug abuse.
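
    The latent-cause clustering idea can be sketched with a Chinese-restaurant-process prior: the probability of assigning a new experience to an existing cause grows with how often that cause has been invoked, which is one way overtraining can produce ersatz habits. A toy sketch under that assumption, not the paper's model.

```python
import numpy as np

def crp_posterior(counts, likelihoods, alpha=1.0):
    # Posterior over latent causes for one new experience: a Chinese-
    # restaurant-process prior (existing causes in proportion to their
    # usage counts, a new cause in proportion to alpha) multiplied by each
    # cause's likelihood of the observation (taken as 1.0 for a new cause).
    prior = np.append(np.asarray(counts, dtype=float), alpha)
    post = prior * np.append(np.asarray(likelihoods, dtype=float), 1.0)
    return post / post.sum()

# After overtraining, the training cause has accumulated so much prior
# mass that even a poorly fitting observation (likelihood 0.3) is still
# attributed to it rather than to a new cause:
post = crp_posterior(counts=[50.0], likelihoods=[0.3])
```

    With fewer training trials (smaller counts), the same observation would instead favor spawning a new latent cause, and behavior tied to the old cause would change.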

    Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning

    The ability to acquire abstract knowledge is a hallmark of human intelligence and is believed by many to be one of the core differences between humans and neural network models. Agents can be endowed with an inductive bias towards abstraction through meta-learning, where they are trained on a distribution of tasks that share some abstract structure that can be learned and applied. However, because neural networks are hard to interpret, it can be difficult to tell whether agents have learned the underlying abstraction or, alternatively, statistical patterns that are characteristic of that abstraction. In this work, we compare the performance of humans and agents in a meta-reinforcement learning paradigm in which tasks are generated from abstract rules. We define a novel methodology for building "task metamers" that closely match the statistics of the abstract tasks but use a different underlying generative process, and evaluate performance on both abstract and metamer tasks. In our first set of experiments, we found that humans perform better at abstract tasks than metamer tasks, whereas a widely used meta-reinforcement learning agent performs worse on the abstract tasks than on the matched metamers. In a second set of experiments, we base the tasks on abstractions derived directly from empirically identified human priors. We use the same procedure to generate corresponding metamer tasks and see the same double dissociation between humans and agents. This work provides a foundation for characterizing differences between humans and machine learning that can be used in future work towards developing machines with human-like behavior.

    Tonic Dopamine Modulates Exploitation of Reward Learning

    The impact of dopamine on adaptive behavior in a naturalistic environment is largely unexamined. Experimental work suggests that phasic dopamine is central to reinforcement learning, whereas tonic dopamine may modulate performance without altering learning per se; however, this idea has not been developed formally or integrated with computational models of dopamine function. We quantitatively evaluate the role of tonic dopamine in these functions by studying the behavior of hyperdopaminergic DAT-knockdown mice in an instrumental task in a semi-naturalistic homecage environment. In this "closed economy" paradigm, subjects earn all of their food by pressing either of two levers, but the relative cost of food on each lever shifts frequently. Compared to wild-type mice, hyperdopaminergic mice allocate more lever presses to high-cost levers, thus working harder to earn a given amount of food and maintain their body weight. However, both groups react similarly quickly to shifts in lever cost, suggesting that the hyperdopaminergic mice are not slower at detecting changes, as would be expected with a learning deficit. We fit the lever-choice data using reinforcement-learning models to assess the distinction between acquisition and expression that the models formalize. In these analyses, hyperdopaminergic mice displayed normal learning from recent reward history but a diminished capacity to exploit this learning: a reduced coupling between choice and reward history. These data suggest that dopamine modulates the degree to which prior learning biases action selection and consequently alters the expression of learned, motivated behavior.
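
    The acquisition/expression distinction that such models formalize is typically carried by two separate parameters: a learning rate (acquisition) and a softmax inverse temperature (expression, the coupling between learned values and choice). A minimal sketch with illustrative numbers; the specific parameter values are assumptions, not the paper's fits.

```python
import numpy as np

def choice_probs(q, beta):
    # beta (softmax inverse temperature) controls expression: how strongly
    # learned values bias action selection. Acquisition is governed by a
    # separate learning rate and can be normal even when beta is reduced.
    z = beta * np.asarray(q, dtype=float)
    p = np.exp(z - z.max())
    return p / p.sum()

q = [0.2, 0.8]                       # the same learned values in both cases
normal = choice_probs(q, beta=5.0)   # tight coupling: exploits the better lever
reduced = choice_probs(q, beta=1.0)  # loose value-choice coupling
```

    Both agents have learned identical values; only the expression parameter differs, which is the pattern the abstract attributes to the hyperdopaminergic mice.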

    Humans decompose tasks by trading off utility and computational cost

    Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition (N=806) that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than with alternative normative accounts, and are most consistent with a heuristic, betweenness centrality, that is justified by our approach. Taken together, our results provide new theoretical insight into the computational principles underlying the intelligent structuring of goal-directed behavior.
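
    Betweenness centrality, the heuristic the study finds most consistent with human decompositions, scores each node by the fraction of shortest paths between other node pairs that pass through it. A brute-force sketch for small graphs; the task graph below is hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_paths(graph, s, t):
    # Enumerate all shortest s-t paths by breadth-first search over paths
    # (adequate for the small illustrative graph below).
    paths, best, queue = [], None, deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            continue
        if path[-1] == t:
            best = len(path)
            paths.append(path)
            continue
        for nb in graph[path[-1]]:
            if nb not in path:
                queue.append(path + [nb])
    return [p for p in paths if len(p) == best]

def betweenness(graph):
    # For every node, sum over node pairs of the fraction of shortest
    # paths between that pair which pass through the node (unnormalized).
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        paths = shortest_paths(graph, s, t)
        for v in graph:
            if v not in (s, t):
                score[v] += sum(v in p for p in paths) / len(paths)
    return score

# Hypothetical task graph: C is the bottleneck between the A-B cluster and
# the dead-end state D, so it gets the highest score and is the natural subgoal.
g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
scores = betweenness(g)
subgoal = max(scores, key=scores.get)
```

    For graphs of realistic size, an efficient algorithm such as Brandes' (as implemented in libraries like NetworkX) would replace this path enumeration.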

    A Dual Role for Prediction Error in Associative Learning

    Confronted with a rich sensory environment, the brain must learn statistical regularities across sensory domains to construct causal models of the world. Here, we used functional magnetic resonance imaging and dynamic causal modeling (DCM) to furnish neurophysiological evidence that statistical associations are learnt, even when task-irrelevant. Subjects performed an audio-visual target-detection task while being exposed to distractor stimuli. Unknown to them, auditory distractors predicted the presence or absence of subsequent visual distractors. We modeled incidental learning of these associations using a Rescorla-Wagner (RW) model. Activity in primary visual cortex and putamen reflected learning-dependent surprise: these areas responded progressively more to unpredicted, and progressively less to predicted, visual stimuli. Critically, this prediction-error response was observed even when the absence of a visual stimulus was surprising. We investigated the underlying mechanism by embedding the RW model into a DCM to show that auditory-to-visual connectivity changed significantly over time as a function of prediction error. Thus, consistent with predictive coding models of perception, associative learning is mediated by prediction-error dependent changes in connectivity. These results posit a dual role for prediction error in encoding surprise and driving associative plasticity.
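
    The Rescorla-Wagner model referred to here is a delta rule: the prediction error both updates the association and serves as the surprise signal, including for surprising omissions. A minimal sketch with an illustrative learning rate.

```python
def rescorla_wagner(v, outcome, alpha=0.2):
    # One delta-rule update: the prediction error delta = outcome - v both
    # drives learning and indexes surprise. alpha is an illustrative
    # learning rate, not a fitted value.
    delta = outcome - v
    return v + alpha * delta, delta

# An auditory cue is repeatedly followed by a visual stimulus (outcome 1):
v = 0.0
for _ in range(10):
    v, _ = rescorla_wagner(v, 1.0)

# A surprising omission after learning yields a large *negative* prediction
# error, mirroring the reported response to an unexpected absence:
v_after, omission_delta = rescorla_wagner(v, 0.0)
```

    The omission trial produces a prediction error nearly as large in magnitude as an unexpected presentation early in learning, which is why the absence of a predicted stimulus can itself be "surprising" to visual cortex.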